Kriging-based sequential design strategies using fast 
cross-validation techniques with extensions to multi-fidelity 

computer codes 

Loic Le Gratiet ^ and Claire Caniiamela 

Universite Paris Diderot 75205 Paris Cedex 13 

* CEA, DAM, DIP, P-91297 Arpajon, Prance 
October 30, 2012 



1 Abstract 

Kriging-based surrogate models have become very popular during the last decades to ap- 
proximate a computer code output from few simulations. In practical applications, it is very 
common to sequentially add new simulations to obtain more accurate approximations. We 
propose in this paper a method of kriging-based sequential design which combines both the 
error evaluation providing by the kriging model and the observed errors of a Leave-One-Out 
cross-validation procedure. This method is proposed in two versions, the first one selects 
points one at-a-time. The second one allows us to parallelize the simulations and to add 
several design points at-a-time. Then, we extend these strategies to multi-fidelity co-kriging 
models which allow us to surrogate a complex code using fast approximations of it. The main 
advantage of these extensions is that it not only provides the new locations where to perform 
simulations but also which versions of code have to be simulated (between the complex one 
or one of its fast approximations). 

A real multi-fidelity application is used to illustrate the efficiency of the proposed ap- 
proaches. In this example, the accurate code is a two-dimensional finite element model and 
the less accurate one is a one-dimensional approximation of the system. 

Keywords : kriging, co-kriging, sequential design, cross-validation, resource allocations, 
multi- fidelity codes. 



2 Introduction 

Kriging-based surrogate models have become very popular during the last decades to design 
and analyze computer experiments [Sacks et al., 1989] . The reader is referred to the books of 
[Stein, 1999| , [Santner et al., 2003| and [Rasmussen and Williams, 2006[ for more detail about 
kriging models. Usually, in real applications, two stages are performed to surrogate a computer 
code with a kriging model. The first one consists in building a kriging model from simulations 
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coming from an initial experimental design set. Many methods exist to build the initial 
design set, in order to ensure appropriate space filling properties, the reader is referred to 
[Fang et al., 2006] for a non-exhaustive review of them. The second stage consists in adding 
simulations sequentially at new design points which complete the initial one. The selection of 
the new points are usually based on criteria to improve the global accuracy of the kriging model 
and this will be our goal in this paper. To be complete, we mention that sequential kriging 
has also been widely used in optimization (see [Jones et al., 1998] , [Picheny et al., 2012] ) and 



to estimate probabilities of failure [Beet et al., 2012 



Kriging models are a powerful tool to enrich an experimental design set since it provides 
through the kriging variance - also called predictor's Mean Squared Error (MSE) or variance 
of prediction - an estimation of the model MSE. Kriging literature provides lot of criteria usu- 
ally based on the kriging variance for sequentially design the experiments [Sacks et al., 198"9| . 
Furthermore, [Bates et al., 1996[ and [Picheny et al., 2010 propose more efficient criteria by 
considering the Integrated MSE (IMSE). It consists in integrating the mean value of the MSE 
integrated over the input parameter space. We note though that the IMSE can be com- 
putationally expensive to assess, especially when the dimension increases. Although these 
criteria are efficient for many cases, they can suffer from an important flaw when the ac- 
curacy of the kriging model is not homogeneous over the input parameter space. Indeed, 
the kriging variance is determined by the distances between prediction and design points but 
not by the real model errors. To flx this important flaw, we can use the Empirical IMSE 
suggested in [Sacks et al., 198"9 which evaluates the model errors through a test set. Never- 
theless, in a complex computer code framework, it could be too expensive to consider an ex- 
ternal test set and cross-validation (CV) based criteria are more signiflcant. As an illustration 
[Kleijnen and van Beers, 2004[ and [van Beers and Kleijnen, 2008[ combine a bootstrapping 
and a CV procedure to evaluate the predictor's MSE. Although this method improves the 
classical approach, it still does not take into account the real model errors. We note that a 
strength of the method proposed by [Kleijnen and van Beers, 2004"[ is that it can be applied 
to others type of surrogate models than the kriging one. 

The flrst focus of this paper is on sequential design to improve the accuracy of a krig- 
ing model. In particular, we propose new criteria combining the kriging variance and the 
Leave-One-Out CV (LOO-CV) errors. The CV errors allow for focusing the new observations 
on regions where the real model errors are large. Furthermore, thanks to the equations pre- 
sented in [Dubrule, 1983[ , the LOO-CV equations are fast to compute and thus the suggested 
approach is not expensive. 

Kriging has recently been extended to allow for the use of coarse versions of a com- 
plex computer code to improve the accuracy of its approximation. These so-called multi- 
fidelity surrogate models have become of growing interest. A first one was proposed in 
[Craig et al., 199*8 which is based on a linear regression formulation and was improved by 
[Cumming and Goldstein, 2009 through a Bayes linear formulation. The first multi-fidelity 
model using co- kriging was suggested by [Kennedy and O'Hagan, 2000[ . Then, several works 
dealing with this model have been developed [Kennedy and O'Hagan, 2001 , [Higdon et al., 2004 



[Reese et al., 2004| , [Qian and Wu, 2008] . Defining sequential design strategies in a multi- 
fidelity framework is also of interest and is still an open problem. A method based on nested 
Latin hypercube designs is suggested in [Xiong and Qian, 2012] . However, it does not allow 
for adding few simulations (e.g. it cannot perform an one step at-a-time sequential design) 
and it does not take into account the accuracy of the coarse code versions and the time ratios 
between two code levels. 
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The second focus of this paper is on sequential design for co-kriging model. We adapt 
the new strategies suggested for the kriging model to the multi-fidelity co-kriging one. The 
strength of the proposed extensions is that they not only provide the new points where to 
perform new simulations but they also determine which version of code is worth being sim- 
ulated. These new criteria take into account the computational time ratios between code 
versions. They are based on a proxy of the IMSE reduction and on an original result giv- 
ing the contribution of each code on the total variance of the model. We note that se- 
quential design in a multi-fidelity framework has also been applied for optimization purposes 
[Forrester et al., 2007| and [Huang et al., 2006 . 



The paper is organized as follows. First, we introduce the kriging model and present our 
CV-based sequential design strategies. We illustrate these strategies in tabulated functions. 
Secondly, we present the proposed co-kriging multi-fidelity model and the extensions of the 
previous strategies. Finally, we apply the sequential co-kriging approach to a mechanical 
example. 

3 Kriging models and sequential designs 

In this Section, we briefiy introduce the kriging model and some of its classical sequential 
design criteria. Then, we will present our sequential strategies to enhance kriging models 
considering the region with large LOO-CV errors. 

3.1 The Kriging model 

The kriging model is a widely used method to surrogate the output of a computer code from 
few simulations [Sacks et al., 198"9[ . Let us denote by y{x) the output of the code at point 
X & Q. Here, y{x) is a scalar and Q C M*^. Furthermore, we denote by D = {xi, . . . ,Xn} the 
experimental design set and = y(D) the value of y{x) at points in D. 

In a kriging framework, we set that the prior knowledges about the code can be modeled 
by a Gaussian process Yq{x). Usually, we consider a Gaussian process with mean of the 
form mo{x) = i'{x)l3, with f'{x) = . . . , fp{x)) and with covariance function ko{x,x) = 

a^r (x, x; 6). Then, the kriging equations are given by the Gaussian process Yq{x) conditioned 
by its known values y„ at points in D: 

Yn{x) = [Y^{x)\Yo{T>) = yn]-QV{mn{x),K{x,i)) (1) 

where: 

mn{x) = f'(x)/3 + r'(x)R"i(y„, - F/3) (2) 

and: 



kn{x,x) = ^r(x,x) - ( f'(x) r'{x) ) ^ ^ ^ ^ 




(3) 



where ' stands for the transpose, F are the values of f'(x) at points in D, y{x) is the 
correlation vector between D and x with respect to the correlation function r{x,x), R is 
the correlation matrix of D with respect to r(x,x) and /3 = (F'R~-^F)~^F'R~^y„ is the 
usual least-squares estimates of /3. The model parameters cr^ and 9 could be estimated 
by maximizing their Likelihood [Santner et al., 2003 or with a cross-validation procedure 



[Rasmussen and Williams, 2006 . Furthermore, the Maximum Likelihood Estimate (MLE) of 



a2 is given by = (y„ - F/3)'R-i(y„ - F/3)/(n - p). 
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1 point at-a-time Sequential design 

Now, let us suppose that we want to add a new point Xn+i in D in order to enhance the ac- 
curacy of the kriging model. From the kriging variance kn{x, x) - representing the model MSE 
some sequential design methods have been derived [Sacks et al., 1989) , [Bates et al., 1996 



and [Picheny et al., 2010[ . A first one consists in adding Xn+i where the kriging variance is 
the largest (see [Sacks et al., 1989] ): 



argmaxA:„(x,x) (4) 



However, as presented in [Kleijnen and van Beers, 2004 , its performance is poor. Then, it 



has been improved with a criterion which consists in adding the new point which leads the 
most important IMSE reduction (see [Bates et al., 1996| and [Picheny et al., 2010[ ): 



/ kn{u,u) - kn+i{u,u)du (5) 
JueQ 



Xn+i = arg max 

^ Ju&Q 

Here, the covariance kernels kn+i{u,u) corresponds to the one of the Gaussian process Yn{u) 
([1]) conditioned by a new observation at x. Furthermore, the equation Q shows that the 
kriging variance does not depend on the observations if we consider known the parameters 
0"^ and 9. Therefore, in that case, kn+i{u,u) can be computed without new simulations. 
We denote by MinlMSE this criterion. Finally, we also consider the criterion presented by 
[Kleijnen and van Beers, 2004[ using a Jackknife estimator for the predictor's variance. Its 
principle is the following one. Let us consider m„^_j(x) the kriging mean built without the i*'* 
observation, the jackknife variance is given by: 

1 " 

s]ack{^) = —( TT ^iVi - yf (6) 

nin — 1) ^ — ' 

where yi = nmn{x) — [n — l)m„^_j(x) and y = iLi^*/''^- Then, we consider candidate points 
coming from a maximin LHS Design [Fang et al., 2006 and we add those which maximize the 



jackknife variance. We denote by KleiCrit this criterion. 



q points at-a-time Sequential design 

There is a natural way to extend these algorithms when the simulations can be performed 
simultaneously. Indeed, the covariance kernel fc„_|_i(x,x) of the Gaussian process Yn{x) condi- 
tioned by the new observation at point Xn^i can be computed without knowing y{xn+i) when 
we consider the model parameters o"^ and 6 as known. Then, from A;„+i(x,x), we can find a 
new point x„_|_2 where to perform a new simulation using the same criterion as in equation 
([5]) and the kernel kn+2{x,x). Thus, considering the parameters o"^ and 9 as known (they 
are fixed to their estimated values), we can determine with this procedure q good locations 
where to perform simulations. We call this method the liar sequential kriging. This idea is 
also presented in a framework of kriging-based optimization in [Ginsbourger et al., . 

3.2 LOO-CV based strategies for kriging sequential design 

We present in this subsection new sequential- kriging strategies. The main difference between 
these new strategies and the previous ones is that they take into account the real model errors 
through the LOO-CV equations. 
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The proposed sequential methods are based on the following original proposition. It gives 
the closed form expressions of the LOO-CV equations where the model parameters /3 and 
0"^ are re-estimated after each removed point. This result is already known when is fixed 
[Dubrule, 1983| and [Rasmussen and Williams, 2006 . 



Notations: Aj^j is the i element of the main diagonal of A, Aj is the i row of the matrix 
A, A_j is the matrix A without its i*'* row, A_j^j is the i*'* column of A without its i*^ element, 
P^i-i = A'_ - ■ and A_j^_i is the matrix A without the i*'^ row and column. 

Proposition 1 Let us denote by Yn^-i{x) the Gaussian process Yq{x) conditioned by the values 
Yn-i = \ y{xi). Then, the predictive mean of Yn^-i{x) at point Xi is given by: 

mn,-i{xi) = y{xi) - [R-Hyn.-i - F_,/3_i)] 7 (7) 

where /3_, = (F'_,KiF„,)-'F'„,K,y„,_i andl^i = [R"^] - [R"^] / 
Furthermore, the predictive variance of Yn-i{x) at point Xi is given by: 

K,-^ixi) = alj [R-^] ^ ^ + (8) 



where = ([R-^F]^ / [R-^] . .)' (F'_,K,F_,)"i ([R-^Fj^ [R^^; 

al^ = (yn,-i - F.J.,)'Ki fy„,_, - F_.,^„,) /{n-p- 1) 



1,1 



and 



The previous proposition provides a powerful tool to compute the LOO-CV predictive 
means and variances. Indeed, several elements of the equations have been already computed 
during the models construction (e.g. the inverse of the matrix R). Consequently, the LOO-CV 
equations are fast to compute and can be easily recomputed at each step of the sequential 
strategy. We note that the originality of this result is the estimation of c^j. As we use the 
value of kn-i{xi) strongly depending on 0"^^ in our forthcoming developments, it is important 
to well estimate it. 

Now, let us denote by Gloo-cv ~ iivi^i) ~ ^n,-i{xi))^ the vector of the LOO- 

L J i=l,...,n 

CV squared errors and Slqo-cv ~ {^n-i{xi)]^^^ ^ the vector of the LOO-CV variances. 
Furthermore, let us consider the Voronoi cells (Vi)i=i^...^„ associated with the points (xj)i=i^...^„: 

Vi = {x e Q, \\x - Xi\\ < \\x - XjW, Vj / i}, i,j = l,...,n (9) 

In the remainder of this section, we present two strategies to sequentially add simulations which 
use eLoo-cV' ^LOO-CV '^^^ intuitive idea of the suggested criteria is to enhance the 

predictive variance in the locations where the LOO-CV errors are important. 

LOO-CV-based 1 point at-a-time Sequential design 

Let us denote by Xn+i the new point that we want to add to D. We consider the point 
solving the following problem: 



=1 ploo-cvj 



Xn+1 = argmaxA:„(x,x) 1 + ^ ^^LOO cv^ j^^^^^ j ^-^^^ 



5 



This criterion considers the predictor's MSB kn{x,x) adjusted with the LOO-CV errors and 
variances. For equivalent kn{x,x), the criterion favors the points close to an experimental 
design point with large LOO-CV errors. Furthermore, if two points are in the same Voronoi 
cell, the one with the largest predictor's MSB is considered. Therefore, a sequential strategy 
with this criterion focus on the regions of Q where the LOO-CV errors are the largest. We 
note that the standardization with s^oo-cv important since it is not necessary to enlarge 
the predictor's MSB in the regions where it is well or over estimated. 



LOO-CV-based q points at-a-time Sequential design 

We extend here the previous criterion for a q points at-a-time sequential design. First, 
we emphasize that the liar sequential kriging is not relevant for this new criterion. Indeed, 
conditioning on model parameters, with a liar method we can compute the kriging variances 
{kn+i{x,x))i=i^,,,^q but not the LOO-CV equations. Therefore, we use another strategy to 
propose q new locations where to perform the simulations. This approach is proposed in 
IDubourg et al., 2011 in a different framework. The idea of the suggested method is to select 



the q best points with respect to the criterion (fTOj) from N candidate points. These 
candidate points are chosen with the following algorithm. 

1. Generate A^mcmc samples with respect to the probability density function proportional 

to kn{x, x) with a suitable Markov Chain Monte Carlo (MCMC) technique [Robert and Casella, 2004] . 

2. Bxtract from these samples representative points with a iV- means clustering technique 
[MiiZQueen, 1967| . 



As presented in [Dubourg et al., 2011 the use of this algorithm to select A^ candidate 



points in a kriging framework is efficient. Indeed, it allows us to concentrate the points 
at the modes of the kriging variance. In the proposed strategy, we always take N > q 
and we choose from the A^ cluster centers (Cj)i=i,.,^7v the q points where kn^adji^y^) = 
kn{x, x) I 1 -|- y27=^ ' ]l3;gy. I is the largest. For the MCMC procedure, we use a Metropolis- 

Hastings (M-H) algorithm with a Gaussian jumping distribution. It is centered on the last 
sample point and has a standard deviation such that the acceptance rate is around 30% (see 
[Robert and Casella, 2004| ). Furthermore, we set A'^mcmc such that A'^mcmc ^ For the 
A^-means procedure, we choose the value of A^ with respect to the following criterion: 

max min kn{x,x) (11) 

N>qx&iCi)i 

where {Ci)i=i,„^N are the cluster centers. This criterion prevents from having a cluster cen- 
ter in a region where the kriging variance is close to zero. Furthermore, if the number of 
clusters is too high, the cluster centers get away from the modes and consequently the value 
of min^.g((7.),_j ^kn{x,x) decreases. Therefore, this criterion also prevents from having a 
number of clusters too large. 



4 Sequential design in a multi-fidelity framework 

A computer code can often be run at difi'erent levels of accuracy. In this case, it can be worth 
using low-accuracy versions of a code to improve its approximation. Co-kriging models are 
well suited to build such multi-fidelity surrogate models (see [Kennedy and O'Hagan, 2000 
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and IQian and Wu, 2008| ). In this section, we present the considered multi-fidehty co-kriging 
models and we extend the previous sequential design strategies in this framework. We note 
that, in a multi-fidelity framework, the search for the best locations where to run the code is 
not the only point of interest. Indeed, once the best locations are determined, we also have 
to decide which code level is worth being run. This will not only depend on the time-ratios 
between the code levels but also on the contribution of each code level to the total predictor's 
MSE. 



4.1 Multi-fidelity co-kriging models 

Let us suppose that we want to surrogate a computer code output y^{x) and that coarse 
versions of this code (y'(x));=i^,,,^s_i are available. These codes are sorted by order of fidelity 
from the less accurate y^{x) to the most accurate y^~^{x). All code levels are modeled by 
Gaussian processes {Y^{x))i=i^,,,^s with respect to the following relationship with I = 2, . . . , s: 

Y\x) = pi^iY^-^{x) + 6^{x) 

Y^-\x)±5^ix) (12) 

y'-i(x) ~ [y'-i(x)|Y('-i) = yC-i)] 

where = (Y\...,Y0 with Y' = y'(D'), y^'^ = (y\...,y') with y' = y'(D') and 
(D')/=i^ , jj are the experimental design sets at level I with n/ points and such that D* C 
D'*"^ ^ ■ ■ ■ ^ D^- We note that the nested property is not necessary to build the model 
but allows for efficient parameter estimations. Furthermore, conditioning on parameters 
af and 6i, (5'(x) is a Gaussian process of mean ii{x)Pi with f/(2;) = (/{(x), . . . , fp^{x)) and 
covariance kernel afri{x,x,6i). By convention Y^{x) has the same distribution as 6^{x). 

The multi-fidelity co-kriging model at level / = 2, . . . , s is given by the following distribu- 
tion: 

Y^ix) = [Y^{x)\Y(^) =y(^)] gv{f.l^{x),kl^{x,i)) (13) 

with: 
and: 

kl,^{x,x) = pf_^kl;^\{x,i) +af r;(x,x) - ( h[{x) rj(x 

(15) 

where F/ are the values of f/(x) at points in D^, r/(x) is the correlation vector between and 
X with respect to the correlation function ri{x,x), Hi is the correlation matrix of D' with re- 

spect to ri{x,i), ^x) = [^J,7_\(x) f/(x)] and ^^^^ ^ = (H;R-1H;)"1h;R- V with = 
[y'~^(D')F/] is the usual least-squares estimates of ( ]. Furthermore, the restricted 



= Pi~if^i7,\i^) + + r'i{x)RYHy' - F^A - A-iy'"HDO) (14) 




maximum likelihood estimate of af is given by o"^^ = [ y' — { ^ | ] ^ ( y' — H/ ' ^ 



Pl — 1) [Santner et al., 2003 
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Remark. The important property of this co-kriging model is that its MSE (|15p provides 
through the term pf_ikl^^^ the contribution of the code level Z — 1 to the total predictor's 
MSE. Therefore, it can allow us to determine which code level is worth being simulated at a 
new location x. 



4.2 Sequential design for multi-fidelity co-kriging models 

The aim of this subsection is to extend the sequential kriging strategies proposed in Subsection 
l3.2l to the suggested multi-fidelity co-kriging model. These extensions are based on the variance 
decomposition property presented in Subsection 14.11 in equation (jlSp and on the following 
extension of Proposition (T) 

Proposition 2 For I = l,...,s, let us consider the multi-fidelity model il^) with C 
D'^"*^ ^ ••• C D^. We denote by the index of corresponding to the i*^ point x\ ofD^ 
with 1 < j < I- Then, if we note £hOO-CY,liXi) ^he LOO-CV error (i.e. real value minus 
predicted value) at level I and point x\, we have: 



l[^T\,, (16) 



where eLOO-CV,i(a^i) is given by PropositionUl ^ ^^^^ ^ = (H^ .^^K/H; .^J ^H; _^^Kiy'_ 

and = [RT']-,,,.,, - [Rr^]_,,, [Rr^],,_, / 

Furthermore, if we note o"loo-CV /(■^D variance of the LOO-CV, we have: 



C'LOO-CY,li4) = /5f-lf^LOO-CV,i-l(^i) + "^l-^J i^i ^]^,,^^ + "^i (17) 
wtth ,1 = nf (h; _^ K;H;,_^ r\ u; = [Rr'H,]^^ /[^7\,^, ^nd: 



where H/ = [y'_^^ 

Proposition [2] provides closed form expressions for the LOO-CV errors and variances. From 
them, the LOO-CV equations are fast to compute and consequently they can be used in a 
sequential procedure with a low computational cost. Furthermore, since the experimental 
design sets are nested, we state that during the LOO-CV procedure at level the points 
are removed from all code levels. Finally, from these equations, we can adjust the co-kriging 
variances (fe^^ (x, ^));_^ ^ at each level using the same method as presented in equation (flU]) . 

1 point at-a-time sequential co-kriging. First, let us consider Xncw the point solving the 
problem: 

a^ncw = arg max /c^^ (x, x) (19) 
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Therefore, we want to compute a new simulation at point where the predictor's MSE is 
maximal. Now, let us consider two successive code levels / — 1 and /. The question of interest 
is to estimate which of these two code levels is worth being simulated. 

First, thanks to the equation (jlSp . we can deduce the contribution of each code levels to 
the predictor's MSE. Let us define the following notation for / = 2, . . . , s: 




crlix) = af - { h[{x) r;(x))( ^ ) ( )1 (20) 

and (Tgi{x) = kji^{x,x). Then, we have: 

I 

kl,,{x,x) = Y,<^U^)llP^j (21) 

i=l j=i 

Let us consider that the parameters {9i)i=i^...^s define the characteristic length-scales of the 



kernels {{ri{x,x;6i))i=i^,„^s (see Rasmussen and Williams, 2006 p. 83). Then, we can approx- 



imate the reduction of the IMSE after adding a new point Xncw at level I with the following 
formula: 

/ l~i d 

IMSE^,d(^ncw) = E4(^new)n/'i 11 (22) 

i=l j=i m=l 

with 01 = {6f, . . . , of). Indeed, at each stage, cr|i (a^ncw) Yi^'^i P] represents the contribution of 
the bias 5'^{x) to the co-kriging variance and nm=i ^7* represents the volume of influence of 
Xnew at level j. This criterion is justify by the fact that the reduction of IMSE' defined by 
IMSE' = Jg cr^j (x) dx after adding a new point Xnew has the same order of magnitude than 

(^liix-acw) times the volume of influence nm=i °f ^new 

Now, let us consider that the ratio of computational times between the codes y\x) 
and y'~^(x) equals Bi/i_i. It is worth running the code y'~^(x) if i?;//_iIMSE'~J(a:ncw) > 
IMSE[.gj(xnew)) i-e. if the potential uncertainty reduction by running -B;/;_i times y'^^(x) is 
greater than the one when we run one simulation on y\x). From this criterion, we can deduce 
the following algorithm for an one at-a-time sequential co-kriging model taking into account 
both the computational ratios between the different code levels and the contribution of each 
level to the total co-kriging variance. 
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Algorithm 1 One point at-a-time sequential co-kriging 
1: Find 

Xncw such that x-^icw — arg maXj; kf^^(^x, x") 
2: for / = 2, . . . , s do 
3: if (a^ (xnew) < IMSE') then 

4: Run y'~^(Xnew) 

5: end for 

6: else 

7: if (lMSE^-Hxnew)/IMSE[,d(^ncw) > V^z/i-i) then 

8: Run y'"^(Xncw) 

9: end for 

10: end if 

11: end if 
12: end for 
13: if (/ = s) then 
14: Run y'(Xncw) 
15: end if 



Remarks: Algorithm [T] evaluates for two successive code levels / — 1 and I, which one is 
worth being simulated. It starts with the levels one and two, then two and three and so on. 
When it finds that the level / — 1 is more promising than the level /, it stops the loop and 
simulate Xnew at code levels y^{x), . . . Since the loop is defined from level 1 to level 

s, it favors simulations at low code levels. Therefore, it will tend to learn the coarse code 
versions before learning the accurate ones. We note that during the loop of the algorithm [H 
the parameters are not re-estimated. In fact, they are re-estimated after adding the new point 
Xnow Moreover, the first test (j'^i{x,^cw) < IMSE' checks in averaged if the code level / at point 

Xnow is worth being run. Then, the test IMSE^;:^^(2;new)/IMSE[^d(2;new) > l/^V'-i evaluates 
which code levels between I and I — 1 is the most promising. Finally, if we consider that the 
code level I is more promising than the code level Z — 1, we confront it to the following code 
level I + 1. We note that the algorithm [1] is reiterated until a prescribed accuracy is reached 
or the computational time budget is spent. 

1 point at-a-time sequential co-kriging with adjusted predictor's MSE. From 
Proposition [21 Algorithm [1] and Equation ()22p . we can extend the criterion (jlOp to the multi- 
fidelity co-kriging model. Let us consider the following quantity: 

(23) 

with the convention po = and {pi)i=i^,,,^i is given by Proposition [2l In equation (j23p . the krig- 
ing variances (^^^(x) in equation (f2T]) is replaced with the adjusted kriging variance presented 

in Subsection 13.21 We note that (^£hOO-CY ,iix)) — Pi-i^hOO-CY is the part of the 

LOO-CV squared error explained by the bias 6^{x) and "^loo— CV ~ /'f— i'-''loo— CV i— 

is the corresponding LOO-CV predictive variance. To adapt the adjusted co-kriging variance 

in a multi-fidelity framework, we just have to replace IMSE|,g^(x) with IMSE'^^ j^^- (x) in the 



10 



algorithm [TJ 



(q')i=i^,,,^s points at-a-time sequential co-kriging. In this paragraph, we propose an 
extension for the multi-fidelity model of the q points at-a-time sequential design presented 
in Subsection 13.21 Its principle is the following one. First, we select new points for the 
code with the method presented in Subsection 13.21 "LOO-CV based q points at-a-time 

Sequential design". Then, we consider these points as known for the code and we 

select q^~^ new points for this code with the same method. We note that, as presented in 
Subsection 13.11 we can use a liar method to compute the new co-kriging variance without 
simulating y''~^{x) at the new points. Finally, we repeat this procedure for all code levels 
from y^~'^{x) to y^{x). At the end of the procedure, we have 9* new points at level j and 

we want to find the allocation {q^ , . . . leading the largest potential uncertainty reduction 
and under the constraint of a constant CPU time budget. We note the CPU time budget 
T = Yl'^j=iYli=j 1^^'' where (t*)i=i,...,s represents the CPU times of codes (y*(a;))i=i,...,s. The 
algorithm [2] presents the suggested q points at-a-time sequential co-kriging. 

Algorithm 2 points at-a-time sequential co-kriging 

1: Set the budget T > and the allocation {g^, . . . , q''} such that X^j=i Yli=j 9^^'^ ~ ^ 

2: Set (-^M(-;j^(-;)j=i^...^/ for the M-H procedures. 

3: Generate -/V^^p^Q samples with respect to A;^^(x,x). 

4: Find the A^' cluster centers (C')j=;^^ such that A^' = maxjy^^; min^g^,^!^. kl^^{x,x) 

5: Select from (C-)j^]^^ the q'- points (a^ncw,i)«=iv,'?i where A:^; x) is the largest. 

6: for m = / — 1, . . . , 1 do 

7: Compute k"^ , i{x,x) with the new points ( (a^^ncw i)«=iv..,'?i ) 

"m+z^i^m^+i (? \ ' / j=m.+l,...,l 

8: Generate A^JJqp^q samples with respect to /c™ ; .(x,x). 

9: Find the N"^ cluster centers {Cl^)i=i,...,N"' such that N"^ = 
m&XN>qm min^g(c™), k"" , , (x, x) 

10: Select from (C™)i=i,...,7vm the points (x™„ i)i=i,...,g„ where k"" , (x, x) 

' ™m+2^i=^+i q ,aaj 

is the largest. 
11: end for 

In Algorithmic k^ i (x, x) corresponds to the kernel of the Gaussian process (x) 

conditioned by the observations ( (^^ncw i)*=i. ) when the parameters (o"?)j=i^...^; 

V ' ' ' / j=l+l,...,s ' ' 

and {6i)i=i^,,,^i are considered as known (i.e. this corresponds to a liar method). Furthermore, 
qi adj('^''^) corresponds to the predictor's variance A;^^_|_^s ^i(x,x) adjusted with 
the LOO-CV errors and variances: 



(24) 

where k^ , v-s , and o"?, , v-a , (xnpw) are deduced from the equation (1151) . We note that 
for the M-H procedures, we use a Gaussian jumping distribution with a standard deviation 
such that acceptance rate is around 30%. 
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Furthermore, let us consider the following quantity 

I l~l d 

IMSE,ed,q = E E 4Kew,r)n'°'n^r (25) 
4=1 r=l,...,q^ j=i m=l 

We consider the allocation {q^, ... which solves the following optimization problem: 

I I 

{q^ , ■ ■ ■ ,q''} = arg max IMSEred q such that > > qH^ = T (26) 

i.e. we look for the allocation leading the maximal uncertainty reduction. This optimiza- 
tion problem is very complex to solve and a sub-optimal solution will be often considered. 
Nevertheless, when the number of code levels and the budget T are low (e.g. s = 2 in our 
application) an exhaustive exploration of the allocation {q^, ...,(/'} can be performed. We are 
in that case in the presented application . Furthermore, we note that IMSErcd,q is a proxy on 

the IMSE reduction when we add ( (a;^^ j)j=i,...,q„ ) at code levels {y"^{x))m=i,...,i- 

V ' ' ' / m=l,...,l ' ' 

In practical application, the algorithm [2] is reiterated until we reach a prescribed precision 
or the computational time budget is exhausted. 



5 Applications 

We compare in this Section the MinlMSE, KleiCrit and AdjMMSE criteria on toy examples 
and on an application concerning a spherical tank under pressure. We present both the cases 
of 1 point at-a-time and q points at-a-time sequential kriging. Then, we compare on the 
tank application, the suggested sequential kriging and co-kriging methods with s = 2 levels. 
The purpose of this section is to emphasize the efficiency of the LOO-CV-based criteria and 
to highlight that a multi-fidelity analysis can be worthwhile. Finally, for the multi-fidelity 
sequential co-kriging, we present the allocation of the simulations between the coarse code 
and the accurate one. We note that for the different examples, we compare the different 
methods given a prescribed computational time budget. 



5.1 Comparison between sequential kriging criteria 

In this subsection, the 1 point at-a-time sequential kriging criteria (MinlMSE, KleiCrit, Ad- 
jMMSE) are compared on three tabulated functions: 

• Ackley's function on [-2, 2]^ [Ackley, 19871 : 

/ I x"^ + /cos(27ra;) -I- cos(27r'y)\ , , 

fix, y) = -20exp I -0.2^ J - exp i-^ j + 20 + exp(l) 

• Shubert's function on [-2,2]^ |Xian, 2001| : 

f{x, y) = + + ^E ^'^^^ + + 

• Michalewicz's function on [0, vr]^ [Michalewicz, 1992 : 
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f{x,y) = - 




The comparison is performed on a test set Dtost composed of ntest = 1000 points uniformly 
spread on the input parameter space and from 50 different initial experimental design sets. 
We compare the different methods with respect to the Normalized RMSE: 

Nonn RMSE = V^S' - V..(^)) V'W 

maXa;gDtcBt yrcai(a;) - mm^eDtcst 2/rcai(a;) 

where yrea.i{x) is the real value of the output and yprcdC^^) th^ predicted one. The initial exper- 
imental design sets are LHS designs of 10 points optimized with respect to the S-optimality 
[Stocki, 2005| . From these designs, 50 sequential krigings are performed and the convergence 
of the mean and the quantiles of the Normalized RMSE are computed for the three criteria. 
Furthermore, after each added point, the parameters of the kriging model are re-estimated 
with a maximum likelihood method. These estimations are performed thanks to the R library 
'DiceKriging' [Roustant et al., 2012| . 

Figure [T] illustrates the efficiency of the criterion AdjMMSE. Indeed, for the Shubert's 
and Michalewicz's functions, we see that the accuracy of the 1 point at-a-time kriging with 
this criterion is significantly better than the one of the others criteria (both in terms of 
mean and quantiles of the Normalized RMSE). In fact, these functions have the particularity 
to have non-stationarity in some areas of the input parameter space. Thus, the errors are 
more important in these locations and the suggested criterion focus the new points on it. 
Furthermore, the non-stationarity is particularly important for the Shubert's function. For 
this reason, the IMSE criterion performed very bad in that case. Indeed, this criterion is 
efficient for stationary functions (i.e. when the predictor's MSE well predicts the real model 
errors). In contrast, the jackknife predictor's MSE provided by the criterion KleiCrit manages 
to catch this non-stationarity and it performs better than the IMSE criterion. Moreover, we 
see that the performance of the AdjMMSE and IMSE criteria are equivalent for the Ackley's 
function. We note that the Ackley's function is perfectly stationary and the errors have thus 
the same order of magnitude over the input parameter space. 

These examples illustrate the fact that our criterion is more efficient than the other criteria 
when the functions are non-stationary and it remains efficient even in the cases where the 
functions are stationary (its efficiency is equivalent to the one of the IMSE criterion). 

5.2 Spherical tank under internal pressure example 

In this section, we deal with an example about a spherical tank under internal pressure. We 
are interested in the von Mises stresses on the three points labeled in figure [2l Indeed, we 
want to prevent from material yielding which occurs when the von Mises stress reaches the 
critical yield strength. 

The system illustrated in figure [2] depends on the following parameters: 

• P{MPa) G [30,50]: the value of the internal pressure. 

• Rint {mm) G [1500, 2500]: the length of the internal radius of the shell. 

• Tgh(,ii {mm) G [300, 500]: the thickness of the shell. 
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• Tcapirnm) G [100,300]: the thickness of the cap. 

hell (GPa) € [63, 77]: the Young's modulus of the sheU material. 

• Ecap {GPa) G [189, 231]: the Young's modulus of the cap material. 

• (Jy^shell (MPa) € [200,300]: the yield stress of the cap material. 

• CTy^cap (MPa) € [400,800]: the yield stress of the cap material. 

The accurate code output is the value of the von Mises stress provided by an Aster 



finite elements code (http://www.code-aster.orgl modelling the system presented in figure 



El We use the notation x — (^P, RintjTsjieiijTcap, Eghgn, E^ap^ o'y^sheiij (^y,cap) ■ We note that the 
material properties of the shell correspond to high quality aluminum and the ones of the cap 
corresponds to steel from classical to high quality. Then, the coarse code output y^{x) is the 
value of the von Mises stress given by the ID simplification of the tank (|28p (it corresponds 
to a perfect spherical tank under pressure, i.e. without cap). 



\3 



2 {Rint + Tshell) — R% 



int 



According to equation ()28p . the actual input dimension of y^{x) is three (it depends only on 
P, Rint and Tghell) while a sensitivity analysis performed with a Sobol decomposition gives 
that the accurate code depends essentially on four parameters (P, Rmt, Tshell and Tcap)- 
Furthermore, the response is highly stationary. Therefore, only few points are necessary to 
well predict the output of the code. For these reasons, we can start the sequential strategies 
from an initial experimental design set with only 10 points. 

Thus, for the different comparisons, we use a S-optimal LHS design of 10 points for the 
code y^(x). For the coarse code y^{x), we start with a design of 20 points. It is created 
with the following procedure. First, we create a S-Optimal design of 20 points. Second, 
we remove from the 10 points that are the closest to those of D^. Finally, is the 
concatenation of and (this procedure ensures the nested property C D^). We note 
that the CPU time is around 1 minute for the accurate code and 10~^ seconds for the coarse 
code. Nevertheless, to be in a more realistic case, we consider that the CPU time ratio between 
y'^{x) and y^{x) equals -B2/1 = 10- Furthermore, each sequential procedure is performed with 
40 different initial design sets. Then, the mean and the quantiles of probabilities 90% and 
10% of the empirical Normalized MSB are computed from a test set composed of 1000 points 
uniformly spread on the input parameter space. Finally, for the M-H procedure, we use 
a Gaussian jumping distribution such that the acceptance rate is around 30% and we set 
A^MCMC = 50000 (we use 5 000 samples for the the burn-in procedure of the M-H method, 
see [Robert and Casella, 2004] ). For the M-H procedure, we use the package R CRAN mcmc. 
We note that after each added points, the parameters of the kriging or co-kriging models are 
re-estimated with a maximum likelihood method. 

The remainder of this section is organized as follows. First we compare the MSE of the 1 
point at-a-time sequential kriging with the one of the q = 5 points at-a-time one. Second, we 
compare for a given CPU time budget the sequential kriging and cokriging strategies. In the 
forthcoming developments, the response i = 1,2,3 refers to the value of the von Mises stress 
at point i on figure [2j 
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5.2.1 Comparison between sequential kriging criteria 

Figure [3] compares the different criteria of tlie 1 point at-a-time and tlie q = 5 points at-a-time 
sequential kriging. We see tliat the criteria MinlMSE and AdjMMSE give equivalent values 
for the MSE for the 1 point at-a-time procedure and they perform better than the KleiCrit 
criterion. There are equivalent since the output is perfectly stationary. Nevertheless, 

the criterion AdjMMSE is the most efficient for the g = 5 points at-a-time procedure. Indeed, 
the 5 points provided by a liar method with the MinlMSE criterion are not necessarily those 
which maximize the reduction of the IMSE. The method suggested in section 14.21 seems to 
give better solution. 

5.2.2 Comparison between kriging and co-kriging sequential analysis 

In this section, we compare the sequential kriging strategy with the sequential co-kriging with 
respect to the AdjMMSE criterion. Figure U gives the convergence of the empirical normalized 
MSE for the response 1 . We see that the sequential co-kriging performs better than the kriging 
one. Furthermore, at the beginning of the method, the proportion of runs for the accurate 
code is very low. Indeed, the coarse code and the accurate code are extremely correlated for 
this response (around 99%) and thus, during the sequential strategy, the bias between the 
two codes is well estimated. Then, when the coarse code is well approximated, the sequential 
strategy starts to run the accurate one (for a CPU time around 500). 

Figure [S] gives the convergence of the errors for the response 2. For this response, the 
correlation between the coarse and the accurate code is around 80%. Therefore, the proportion 
of runs for the accurate code determined by the sequential strategy is more important than 
in Figure HI Furthermore, we see that this proportion increases with the CPU time. It means 
that the sequential co-kriging improves the approximation of the coarse code at the beginning 
of the procedure and then focuses on the accurate code. As a result, we see that the sequential 
co-kriging strategy is substantially better than the kriging one. 

Figures U] and [5] illustrate the efficiency of the sequential co-kriging when the coarse code 
bring information on the accurate code. For the response 3, the coarse code is weakly correlated 
with the accurate code (around 45%). This is due to the fact that the coarse code models 
the von Mises stress in a perfect spherical tank whereas the response 3 corresponds to the 
one in the cap. Figure [U] shows that in this case, the sequential co-kriging model manages to 
determine that the coarse code is not worth being simulated. Indeed, the proportion of runs 
for the accurate code is very low. Furthermore, it shows that the co-kriging sequential design 
performs as well as the kriging one when the coarse code is non-informative. 

Finally, Figure [7] shows the efficiency of the {q^,q'^) at-a-time sequential co-kriging. We 
set in the algorithm [2] that T = q^ + q'^ + lOg^ = 120 where the CPU time of the coarse code 
is 1 and the one of the accurate code is 10. For the the sequential kriging, we use a g = 10 
at-a-time sequential procedure. Furthermore, Figure [7] shows that at the beginning of the 
procedure, the sequential co-kriging focuses on the approximation of the coarse code whereas 
at the end the number of runs for the accurate code is maximal. We note that the allocation 
of runs for the accurate code in figure [7] agrees with the proportion of runs given in Figure [5j 

The results of the sequential co-kriging on the different responses show that the criterion 
suggested in section 14.11 performs very well. Indeed, it is always better than the sequential 
kriging when the coarse code is informative and its performance is equivalent to it when the 
coarse code is not useful. Furthermore, the different proportions of runs for the accurate code 
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emphasizes that the criterion accurately determines the contribution of each code to the total 
model error and the optimal run allocation between the accurate and the coarse codes. 

6 Conclusion 

This paper deals with sequential strategies for kriging and co-kriging models. The kriging 
model is used to approximate the output of a complex computer code and the co-kriging one 
allows to improve this approximation thanks to coarse versions of the code. 

First, we have presented classical sequential criteria for the kriging model and we have 
suggested another criterion based on the Leave-One-Out cross validation errors. This criterion 
has allowed us to set the new observations at locations where the model error is important. 
The examples presented in the last section have highlighted the efficiency of the suggested 
criterion. Indeed, for non-stationary functions, it provides results significantly better than 
classical criteria and for stationary ones its performance is equivalent to them. We have also 
emphasized the performance of the suggested criterion on a real application. Furthermore, we 
show in the application that when the simulations can be performed in parallel, our method 
has performed better. 

Second, we have presented the extension of our criterion to multi-fidelity co-kriging models. 
We have shown in the application that performing a multi-fidelity sequential co-kriging is 
worthwhile when the coarse code versions are informative (i.e. highly correlated with the 
accurate code). Furthermore, a strength of the proposed approach is that it performs as 
well as a sequential kriging when the coarse code versions are not informative. In fact, the 
proposed extension takes into account the contribution of each code to the total predictor's 
mean squared errors and it determines the best run allocation between accurate and coarse 
code versions given a CPU time budget. 
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Figure 1: Comparison between 1 point at-a-time sequential kriging criteria on toy examples. 
The bold triangles represent the mean of the empirical MSB for the AdjMMSE criterion, the 
bold circles represent it for the MinlMSE criterion and the bold Crosses represent it for the 
KleiCrit criterion. Furthermore, the thin triangles, circles and crosses represent the quantiles 
of probabilities 10% and 90% of the empirical MSE. 
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Figure 2: Scheme of the spherical tank under pressure. 
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Figure 3: Comparison between 1 point at-a-time sequential kriging criteria (on left) and 
g = 5 points at-a-time sequential kriging criteria (on right) on the spherical tank example. 
The bold triangles represent the mean of the empirical MSE for the AdjMMSE criterion, the 
bold circles represent it for the MinlMSE criterion and the bold Crosses represent it for the 
KleiCrit criterion. Furthermore, the thin triangles, circles and crosses represent the quantiles 
of probabilities 10% and 90% of the empirical MSE. 



20 



Response 1 



Reponse 1 





— Sequential kriging 




Sequential co-kriging 


s 

\ 

\ 


\ 







100 



200 



1000 



500 1000 100 200 500 

CPU time CPU time 

Figure 4: Comparison between 1 point at-a-time sequential kriging and co-kriging on the 
response 1 of the spherical tank example with respect to the AdjMMSE criterion (on the left). 
The thick dashed line represents the mean of the empirical MSE for the sequential kriging 
and the thick solid line represents it for the sequential co-kriging. The thin lines represent the 
quantiles of probabilities 10% and 90% of the empirical MSE. Figure on the right represents 
the proportion of runs allocated to the accurate code. 
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Figure 5: Comparison between 1 point at-a-time sequential kriging and co-kriging on the 

response 2 of the spherical tank example with respect to the AdjMMSE criterion (on the left). 
The thick dashed line represents the mean of the empirical MSE for the sequential kriging 
and the thick solid line represents it for the sequential co-kriging. The thin lines represent the 
quantiles of probabilities 10% and 90% of the empirical MSE. Figure on the right represents 
the proportion of runs allocated to the accurate code. 
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Figure 6: Comparison between 1 point at-a-time sequential kriging and co-kriging on the 
response 3 of the spherical tank example with respect to the AdjMMSE criterion (on the left). 
The thick dashed line represents the mean of the empirical MSE for the sequential kriging 
and the thick solid line represents it for the sequential co-kriging. The thin lines represent the 
quantiles of probabilities 10% and 90% of the empirical MSE. Figure on the right represents 
the proportion of runs allocated to the accurate code. 
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Figure 7: Comparison between q = 10 points at-a-time sequential kriging and (g^, g^) points at- 
a-time sequential co-kriging. On the left, the bold circles represents the mean of the empirical 
MSE for the sequential kriging and the bold triangles represent the one of the sequential co- 
kriging. Furthermore, the thin circles and triangles represent the quantiles of probabilities 
10% and 90% of the empirical MSE. On the right, the squares represent the median number 
of runs for the coarse code during the sequential co-kriging and the triangles represent it for 
the accurate code. 
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